Add IBM Cloud infrastructure support#335

Open
sayalibhavsar wants to merge 6 commits into main from test_upload_fix1

@sayalibhavsar
Contributor

Description

This PR adds automated IBM Cloud VSI provisioning with Terraform, including VPC networking, security, floating IPs, and new Ansible roles with burden script integration.

Key capabilities added:

  • Automated IBM Cloud infrastructure provisioning via Terraform
  • Support for running multiple instance types in parallel without naming conflicts
  • VPC creation with optional reuse of existing VPCs
  • Automated SSH key detection and validation
  • Resource cleanup utility for systematic deletion of IBM Cloud resources
  • Error handling for failed provisioning and terraform state issues

Before/After Comparison

Before this PR: IBM Cloud was not supported, there was no parallel multi-instance support, and resources had to be managed manually.

Documentation Check

Clerical Stuff

This closes #310

Relates to JIRA: RPOPC-677

@sayalibhavsar self-assigned this Dec 2, 2025
@sayalibhavsar marked this pull request as ready for review January 10, 2026 08:03
@sayalibhavsar
Contributor Author

New Ansible Roles

  • ibm_create - Basic IBM Cloud VSI provisioning with VPC networking
  • ibm_vpc_create - Advanced IBM VPC provisioning with disk and network management

Infrastructure as Code

  • Terraform configurations for IBM Cloud provider (~1.70)
  • VPC creation with optional reuse of existing VPCs
  • Subnet management (public and private)
  • Security group rules (all inbound/outbound for testing)
  • Floating IP allocation for public access
  • SSH key integration with IBM Cloud
  • Multi-network interface support
  • Data volume creation and attachment
  • Pbench benchmark volume support

Core Script Updates (bin/burden)

  • Added ibm to valid system types
  • ibm_image_lookup() - Query available OS images from IBM Cloud
  • ibm_specific_os_version() - Handle IBM Cloud image IDs
  • IBM Cloud region/zone defaults (us-south with zones 1-3)
  • Default test user set to root for IBM instances
  • Bug fix: user_parent_home_dir logic moved after user determination (handles /root vs /home/root)
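As a rough illustration of the defaults and the ordering fix listed above, the sketch below picks the test user first and only then derives the home directory, so root resolves to /root rather than /home/root. The function names (default_test_user, home_dir_for, default_zone) and the non-IBM fallback user are hypothetical; bin/burden may structure this quite differently.

```bash
# Sketch only: these helpers are hypothetical and not taken from bin/burden.

# Pick the test user. IBM instances default to root (per the notes above);
# the fallback for other system types is just a placeholder.
default_test_user() {
    case "$1" in
        ibm) printf 'root\n' ;;
        *)   printf 'testuser\n' ;;   # placeholder for non-IBM system types
    esac
}

# Derive the home directory only after the user is known.
home_dir_for() {
    if [ "$1" = "root" ]; then
        printf '/root\n'
    else
        printf '/home/%s\n' "$1"
    fi
}

# Default zone: region name plus a zone index in 1..3, e.g. us-south-1.
default_zone() {
    local region
    local index
    region="${1:-us-south}"
    index="${2:-1}"
    printf '%s-%s\n' "$region" "$index"
}

user="$(default_test_user ibm)"
echo "user=$user home=$(home_dir_for "$user") zone=$(default_zone)"
```

Computing the home directory before the user is settled is the kind of ordering problem the bug-fix bullet describes; keeping the two steps as separate functions makes the dependency explicit.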

Automation Playbook Updates (bin/ten_of_us.yml)

  • IBM instance creation workflow integration
  • CodeReady Builder repository enablement for IBM RHEL instances
  • IBM Cloud cleanup/teardown in terraform delete logic

Authentication & Validation

  • API key requirement validation (IC_API_KEY or IBMCLOUD_API_KEY)
  • SSH key auto-detection (prioritizes "zathras" keys, then username keys)
  • Resource group ID retrieval from IBM Cloud CLI
  • Error messages with setup instructions
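The SSH key priority described above could look roughly like the following. This is a minimal sketch, assuming the key names are already available as a newline-separated list (however they are fetched from IBM Cloud) and that matching is a case-insensitive substring test; select_ssh_key is a hypothetical name, not a function from this PR.

```bash
# Hypothetical helper: choose an SSH key name from a newline-separated list.
# Priority: names containing "zathras" first, then names containing the user name.
# Returns non-zero when nothing matches so the caller can abort.
select_ssh_key() {
    local keys
    local user
    local pattern
    local match
    keys="$1"
    user="$2"
    for pattern in zathras "$user"; do
        # An empty pattern would match every line, so skip it.
        [ -n "$pattern" ] || continue
        # -F: fixed string, -i: ignore case, -m 1: stop at the first match.
        match="$(printf '%s\n' "$keys" | grep -i -F -m 1 -- "$pattern")" || continue
        printf '%s\n' "$match"
        return 0
    done
    return 1
}

# Example: the "zathras" key wins even though a user-named key also exists.
example_keys="alice-laptop
zathras-ci
bob-key"
select_ssh_key "$example_keys" bob
```

Returning a non-zero status when no key is found lets the caller stop early, which is in line with the review request to exit if no SSH key exists in IBM Cloud.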

Configuration Templates

  • Package dependencies added to all 11 test templates:
    • coremark, coremark_pro
    • fio, iozone
    • passmark, phoronix
    • pig, pyperformance
    • speccpu2017, specjbb
    • uperf
  • RHEL, Ubuntu, and Amazon Linux package lists

Documentation

  • Comprehensive 304-line README for IBM Cloud integration
  • Prerequisites and setup instructions
  • Usage examples with different configurations
  • Troubleshooting guide
  • IBM Cloud-specific limitations documented

Key Capabilities

  • Automated infrastructure provisioning via Terraform
  • Parallel multi-instance support without naming conflicts
  • Multiple network interfaces for network performance tests
  • Storage volume attachment for disk I/O benchmarks
  • Region/zone selection and defaults
  • Resource cleanup utilities
  • Error handling for failed provisioning
  • Metadata recording for test results

Files Added (38 new files)

  • 1 README
  • 18 Terraform configuration files (.tf)
  • 13 Ansible task files (.yml)
  • 3 Jinja2 templates (.j2)
  • 1 vars file
  • 2 modified core scripts

Files Modified (14 existing files)

  • bin/burden (main CLI script)
  • bin/ten_of_us.yml (main playbook)
  • 11 test configuration templates
  • 1 duplicate exec_dir line in specjbb_template.yml (line 2284)

The implementation follows the same patterns as the existing AWS/Azure/GCP providers, ensuring consistency across the codebase.

Contributor

@dvalinrh left a comment

Good first pass. Various items to address below (expected, given the large number of changes).

- Move API key check from ansible role to burden

- Exit if no SSH key found in IBM Cloud

- Use wait_for_ssh role instead of inline wait

- Extract user readiness check into reusable wait_for_user_ready role

- Revert config file changes (removed re-added pkg lines)

- Fix user_parent_home_dir: move after set_user_name to handle root user

- Change delete_volume_on_instance_delete to true in disks.tf

- Remove wrong CPU type check from ibm_vpc_create

- Remove all pbench references

- Remove hardcoded IBM VPC profiles vars file

- Use in list syntax in ten_of_us.yml

- Simplify README API key documentation

retries: "{{ ansible_ssh_retries | default(5) }}"
delay: "{{ item.0 * 2 + 3 }}"
with_indexed_items:
  - "{{ ip_list }}"

Contributor

If the user account never becomes ready, we need to exit. Note: make sure that every exit path terminates the system if required.

Contributor Author

Added block/rescue to wait_for_user_ready so that when retries are exhausted it calls terminate_on_error to destroy the cloud instance before aborting. Also updated the public IP check in ibm_create to use terminate_on_error instead of a bare fail, since infrastructure already exists at that point.
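
The actual change uses Ansible block/rescue; purely as a shell analogy of the same control flow, the sketch below retries a readiness check and, once retries are exhausted, runs a terminate hook before reporting failure. wait_or_terminate, its arguments, and the RETRY_DELAY variable are hypothetical names introduced for illustration only.

```bash
# Shell analogy of "retry, and on exhaustion tear down before aborting".
# check_cmd and terminate_cmd are caller-supplied command or function names.
wait_or_terminate() {
    local retries
    local check_cmd
    local terminate_cmd
    local attempt
    retries="$1"
    check_cmd="$2"
    terminate_cmd="$3"
    attempt=0
    while [ "$attempt" -lt "$retries" ]; do
        if "$check_cmd"; then
            return 0                      # ready: nothing to tear down
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-3}"         # fixed delay between attempts (assumption)
    done
    "$terminate_cmd"                      # retries exhausted: destroy the instance first
    return 1                              # then signal failure to the caller
}

# Usage (hypothetical hooks):
#   wait_or_terminate 5 user_is_ready destroy_instance || exit 1
```

The point of the pattern is that the failure path and the teardown path are the same path, so a stuck instance is not left running after the run aborts.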

bin/burden Outdated
{
# Validate IBM Cloud API key is set - Terraform requires it
if [[ -z "$IC_API_KEY" ]] && [[ -z "$IBMCLOUD_API_KEY" ]]; then
cleanup_and_exit "Error: IBM Cloud API key not set. Export IC_API_KEY or IBMCLOUD_API_KEY" 1

Contributor

This info should be provided by the scenario file (think CPT pipeline). You may supply it as a variable exported from the shell or via the scenario file. If it comes from the scenario file, we can always issue an export if that is what IBM Cloud requires; otherwise, dump it into the Ansible file and use it from there.

Contributor Author

Added --ibm_api_key as a CLI/scenario file option that gets exported as IC_API_KEY for Terraform, while still supporting the existing environment variable approach.
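
A minimal sketch of how such a lookup might be ordered, assuming the CLI or scenario value takes precedence over the environment (the precedence is an assumption, not something stated in this thread). resolve_ibm_api_key is a hypothetical name, and its argument stands for whatever value --ibm_api_key supplied, possibly empty.

```bash
# Hypothetical helper: settle on one API key and export it as IC_API_KEY.
# $1 is the value passed via --ibm_api_key (empty if the option was not given).
resolve_ibm_api_key() {
    local cli_key
    cli_key="$1"
    if [ -n "$cli_key" ]; then
        # Assumed precedence: an explicit option overrides the environment.
        IC_API_KEY="$cli_key"
    elif [ -z "${IC_API_KEY:-}" ] && [ -n "${IBMCLOUD_API_KEY:-}" ]; then
        # Fall back to the alternative environment variable.
        IC_API_KEY="$IBMCLOUD_API_KEY"
    fi
    if [ -z "${IC_API_KEY:-}" ]; then
        echo "Error: IBM Cloud API key not set. Pass --ibm_api_key or export IC_API_KEY or IBMCLOUD_API_KEY" >&2
        return 1
    fi
    export IC_API_KEY
}
```

Funnelling every source into a single exported IC_API_KEY means Terraform only ever has to look in one place, whichever way the key was provided.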

@github-actions

PullHero Review

Provider: generativelanguage.googleapis.com Model: gemini-2.0-flash

Okay, I've reviewed the changes and here's my assessment.

Overview

This PR introduces support for IBM Cloud in the Zathras test automation framework. This includes a new Ansible role (ibm_create), associated Terraform files, and updates to existing Ansible playbooks to incorporate IBM Cloud provisioning. The changes are extensive, adding significant new functionality to Zathras.

Analysis

General Design and Architecture

  • Plug-in Architecture: The IBM Cloud integration appears to adhere to Zathras's plug-in architecture, with a dedicated Ansible role and Terraform files. This aligns with the goal of modular system type support.
  • Configuration: The reliance on environment variables (specifically IC_API_KEY or IBMCLOUD_API_KEY) for IBM Cloud authentication is standard practice. However, it's crucial to ensure that these variables are handled securely and documented clearly.
  • Resource Management: The role seems to handle basic resource management (VPCs, subnets, security groups, floating IPs) within IBM Cloud. It's important to verify that resources are properly tagged and cleaned up after testing.

Code Quality and Maintainability

  • Ansible Roles: The Ansible tasks are generally well-structured, with clear separation of concerns (e.g., record_ip_info.yml, add_host_to_groups.yml). However, some tasks could benefit from improved error handling and more descriptive names.
  • Terraform: The Terraform configurations appear to be standard and follow best practices. However, it's important to verify that the resources created are properly parameterized and configurable.
  • Templates: The Jinja2 templates (tfvars.j2) seem to be well-organized and easy to understand. However, it's important to ensure that all variables are properly escaped and validated.

Potential Issues and Concerns

  • Security: The security group rules in main.tf allow all inbound and outbound traffic (0.0.0.0/0). This is generally acceptable for testing purposes but should be clearly documented and potentially made configurable.
  • Error Handling: Several tasks use ignore_errors: yes without proper error handling. This can mask underlying issues and make debugging difficult. It's important to add appropriate error handling and logging to these tasks.
  • Resource Group Management: The role automatically uses the first available resource group. This behavior should be documented and potentially made configurable, as users may want to specify a specific resource group.
  • SSH Key Management: The role requires SSH keys to be pre-created in IBM Cloud. This limitation should be clearly documented, and the role should provide guidance on how to create and manage SSH keys.
  • Lack of Spot Instance Support: The README mentions that IBM Cloud doesn't have spot instances. This limitation should be clearly documented, and the role should provide a fallback mechanism or alternative solution.
  • Dependency Management: The README lists several dependencies (IBM Cloud CLI, Terraform, jq). It's important to ensure that these dependencies are properly managed and documented.
  • "Placement Group" vs "Proximity Placement Group": The role uses "Placement Group" in AWS and "Proximity Placement Group" in Azure, but seems to call everything "Placement Group" in IBM. This is inconsistent. This should be Proximity Placement Group throughout.
  • Missing wait_for_user_ready role: This role is added, but there's no equivalent role in the other clouds. This is either needed in the other clouds, or should have a very good reason for not being there (and that reason should be documented).
  • No IBM Cloud specific variables in config_info: When other clouds are used, there are variables that are in the config_info section, there are none for IBM.

Documentation Completeness

  • README: The README provides a good overview of the IBM Cloud integration. However, it could benefit from more detailed explanations of the configuration variables, resource management, and troubleshooting steps.
  • Code Comments: The code is generally well-commented. However, some tasks could benefit from more descriptive comments, especially in complex or critical sections.

Specific Feedback

  • ansible_roles/roles/ibm_create/files/tf/main.tf:
    • The security group rules should be configurable and more restrictive. Consider adding variables for allowed inbound/outbound ports and IP ranges.
    • The resource_group variable description should be updated to reflect that the role automatically uses the first available resource group.
  • ansible_roles/roles/ibm_create/tasks/main.yml

Vote: 0

Development

Successfully merging this pull request may close these issues.

Add IBM Cloud VSI and Bare Metal Provisioning Support to Zathras Framework

2 participants